智能论文笔记

An End-to-End OCR Framework for Robust Arabic-Handwriting Recognition using a Novel Transformers-based Model and an Innovative 270 Million-Words Multi-Font Corpus of Classical Arabic with Diacritics

Aly Mostafa , Omar Mohamed , Ali Ashraf , Ahmed Elbehery , Salma Jamal , Anas Salah , Amr S. Ghoneim

分类：计算机视觉 | 自然语言处理 | 机器学习

2022-08-20

这项研究是有关阿拉伯历史文档的光学特征识别（OCR）的一系列研究的第二阶段，并研究了不同的建模程序如何与问题相互作用。第一项研究研究了变压器对我们定制的阿拉伯数据集的影响。首次研究的弊端之一是训练数据的规模，由于缺乏资源，我们的3000万张图像中仅15000张图像。另外，我们添加了一个图像增强层，时间和空间优化和后校正层，以帮助该模型预测正确的上下文。值得注意的是，我们提出了一种使用视觉变压器作为编码器的端到端文本识别方法，即BEIT和Vanilla Transformer作为解码器，消除了CNNs以进行特征提取并降低模型的复杂性。实验表明，我们的端到端模型优于卷积骨架。该模型的CER为4.46％。

translated by 谷歌翻译

Physics-informed Neural Networks with Periodic Activation Functions for Solute Transport in Heterogeneous Porous Media

Salah A Faroughi , Pingki Datta , Seyed Kourosh Mahjour , Shirko Faroughi

分类：机器学习

2022-12-17

Solute transport in porous media is relevant to a wide range of applications in hydrogeology, geothermal energy, underground CO2 storage, and a variety of chemical engineering systems. Due to the complexity of solute transport in heterogeneous porous media, traditional solvers require high resolution meshing and are therefore expensive computationally. This study explores the application of a mesh-free method based on deep learning to accelerate the simulation of solute transport. We employ Physics-informed Neural Networks (PiNN) to solve solute transport problems in homogeneous and heterogeneous porous media governed by the advection-dispersion equation. Unlike traditional neural networks that learn from large training datasets, PiNNs only leverage the strong form mathematical models to simultaneously solve for multiple dependent or independent field variables (e.g., pressure and solute concentration fields). In this study, we construct PiNN using a periodic activation function to better represent the complex physical signals (i.e., pressure) and their derivatives (i.e., velocity). Several case studies are designed with the intention of investigating the proposed PiNN's capability to handle different degrees of complexity. A manual hyperparameter tuning method is used to find the best PiNN architecture for each test case. Point-wise error and mean square error (MSE) measures are employed to assess the performance of PiNNs' predictions against the ground truth solutions obtained analytically or numerically using the finite element method. Our findings show that the predictions of PiNN are in good agreement with the ground truth solutions while reducing computational complexity and cost by, at least, three orders of magnitude.

translated by 谷歌翻译

On Evaluating Adversarial Robustness of Chest X-ray Classification: Pitfalls and Best Practices

Salah Ghamizi , Maxime Cordy , Michail Papadakis , Yves Le Traon

分类：计算机视觉 | 机器学习

2022-12-15

Vulnerability to adversarial attacks is a well-known weakness of Deep Neural Networks. While most of the studies focus on natural images with standardized benchmarks like ImageNet and CIFAR, little research has considered real world applications, in particular in the medical domain. Our research shows that, contrary to previous claims, robustness of chest x-ray classification is much harder to evaluate and leads to very different assessments based on the dataset, the architecture and robustness metric. We argue that previous studies did not take into account the peculiarity of medical diagnosis, like the co-occurrence of diseases, the disagreement of labellers (domain experts), the threat model of the attacks and the risk implications for each successful attack. In this paper, we discuss the methodological foundations, review the pitfalls and best practices, and suggest new methodological considerations for evaluating the robustness of chest xray classification models. Our evaluation on 3 datasets, 7 models, and 18 diseases is the largest evaluation of robustness of chest x-ray classification models.

translated by 谷歌翻译

Reinforcement Learning in System Identification

Jose Antonio Martin H. , Oscar Fernandez Vicente , Sergio Perez , Anas Belfadil , Cristina Ibanez-Llano , Freddy Jose Perozo Rondon , Jose Javier Valle , Javier Arechalde Pelaz

分类：机器学习 | 人工智能

2022-12-14

System identification, also known as learning forward models, transfer functions, system dynamics, etc., has a long tradition both in science and engineering in different fields. Particularly, it is a recurring theme in Reinforcement Learning research, where forward models approximate the state transition function of a Markov Decision Process by learning a mapping function from current state and action to the next state. This problem is commonly defined as a Supervised Learning problem in a direct way. This common approach faces several difficulties due to the inherent complexities of the dynamics to learn, for example, delayed effects, high non-linearity, non-stationarity, partial observability and, more important, error accumulation when using bootstrapped predictions (predictions based on past predictions), over large time horizons. Here we explore the use of Reinforcement Learning in this problem. We elaborate on why and how this problem fits naturally and sound as a Reinforcement Learning problem, and present some experimental results that demonstrate RL is a promising technique to solve these kind of problems.

translated by 谷歌翻译

A Grid-based Sensor Floor Platform for Robot Localization using Machine Learning

Anas Gouda , Danny Heinrich , Mirco Hünnefeld , Irfan Fachrudin Priyanta , Christopher Reining , Moritz Roidl

分类：机器学习 | 机器人

2022-12-09

Wireless Sensor Network (WSN) applications reshape the trend of warehouse monitoring systems allowing them to track and locate massive numbers of logistic entities in real-time. To support the tasks, classic Radio Frequency (RF)-based localization approaches (e.g. triangulation and trilateration) confront challenges due to multi-path fading and signal loss in noisy warehouse environment. In this paper, we investigate machine learning methods using a new grid-based WSN platform called Sensor Floor that can overcome the issues. Sensor Floor consists of 345 nodes installed across the floor of our logistic research hall with dual-band RF and Inertial Measurement Unit (IMU) sensors. Our goal is to localize all logistic entities, for this study we use a mobile robot. We record distributed sensing measurements of Received Signal Strength Indicator (RSSI) and IMU values as the dataset and position tracking from Vicon system as the ground truth. The asynchronous collected data is pre-processed and trained using Random Forest and Convolutional Neural Network (CNN). The CNN model with regularization outperforms the Random Forest in terms of localization accuracy with aproximate 15 cm. Moreover, the CNN architecture can be configured flexibly depending on the scenario in the warehouse. The hardware, software and the CNN architecture of the Sensor Floor are open-source under https://github.com/FLW-TUDO/sensorfloor.

translated by 谷歌翻译

Brain Tumor Synthetic Data Generation with Adaptive StyleGANs

Usama Tariq , Rizwan Qureshi , Anas Zafar , Danyal Aftab , Jia Wu , Tanvir Alam , Zubair Shah , Hazrat Ali

分类：计算机视觉 | 机器学习

2022-12-04

Generative models have been very successful over the years and have received significant attention for synthetic data generation. As deep learning models are getting more and more complex, they require large amounts of data to perform accurately. In medical image analysis, such generative models play a crucial role as the available data is limited due to challenges related to data privacy, lack of data diversity, or uneven data distributions. In this paper, we present a method to generate brain tumor MRI images using generative adversarial networks. We have utilized StyleGAN2 with ADA methodology to generate high-quality brain MRI with tumors while using a significantly smaller amount of training data when compared to the existing approaches. We use three pre-trained models for transfer learning. Results demonstrate that the proposed method can learn the distributions of brain tumors. Furthermore, the model can generate high-quality synthetic brain MRI with a tumor that can limit the small sample size issues. The approach can addresses the limited data availability by generating realistic-looking brain MRI with tumors. The code is available at: ~\url{https://github.com/rizwanqureshi123/Brain-Tumor-Synthetic-Data}.

translated by 谷歌翻译

New Results for the Text Recognition of Arabic Maghrib{ī} Manuscripts -- Managing an Under-resourced Script

Lucas Noëmie , Clément Salah , Chahan Vidal-Gorène

分类：计算机视觉

2022-11-29

HTR models development has become a conventional step for digital humanities projects. The performance of these models, often quite high, relies on manual transcription and numerous handwritten documents. Although the method has proven successful for Latin scripts, a similar amount of data is not yet achievable for scripts considered poorly-endowed, like Arabic scripts. In that respect, we are introducing and assessing a new modus operandi for HTR models development and fine-tuning dedicated to the Arabic Maghrib{\=i} scripts. The comparison between several state-of-the-art HTR demonstrates the relevance of a word-based neural approach specialized for Arabic, capable to achieve an error rate below 5% with only 10 pages manually transcribed. These results open new perspectives for Arabic scripts processing and more generally for poorly-endowed languages processing. This research is part of the development of RASAM dataset in partnership with the GIS MOMM and the BULAC.

translated by 谷歌翻译

A Survey on Computer Vision based Human Analysis in the COVID-19 Era

Fevziye Irem Eyiokur , Alperen Kantarcı , Mustafa Ekrem Erakın , Naser Damer , Ferda Ofli , Muhammad Imran , Janez Križaj , Albert Ali Salah , Alexander Waibel , Vitomir Štruc

分类：计算机视觉

2022-11-07

The emergence of COVID-19 has had a global and profound impact, not only on society as a whole, but also on the lives of individuals. Various prevention measures were introduced around the world to limit the transmission of the disease, including face masks, mandates for social distancing and regular disinfection in public spaces, and the use of screening applications. These developments also triggered the need for novel and improved computer vision techniques capable of (i) providing support to the prevention measures through an automated analysis of visual data, on the one hand, and (ii) facilitating normal operation of existing vision-based services, such as biometric authentication schemes, on the other. Especially important here, are computer vision techniques that focus on the analysis of people and faces in visual data and have been affected the most by the partial occlusions introduced by the mandates for facial masks. Such computer vision based human analysis techniques include face and face-mask detection approaches, face recognition techniques, crowd counting solutions, age and expression estimation procedures, models for detecting face-hand interactions and many others, and have seen considerable attention over recent years. The goal of this survey is to provide an introduction to the problems induced by COVID-19 into such research and to present a comprehensive review of the work done in the computer vision based human analysis field. Particular attention is paid to the impact of facial masks on the performance of various methods and recent solutions to mitigate this problem. Additionally, a detailed review of existing datasets useful for the development and evaluation of methods for COVID-19 related applications is also provided. Finally, to help advance the field further, a discussion on the main open challenges and future research direction is given.

translated by 谷歌翻译

Lidar-level localization with radar? The CFEAR approach to accurate, fast and robust large-scale radar odometry in diverse environments

Daniel Adolfsson , Martin Magnusson , Anas Alhashimi , Achim J. Lilienthal , Henrik Andreasson

分类：机器人

2022-11-04

This paper presents an accurate, highly efficient, and learning-free method for large-scale odometry estimation using spinning radar, empirically found to generalize well across very diverse environments -- outdoors, from urban to woodland, and indoors in warehouses and mines - without changing parameters. Our method integrates motion compensation within a sweep with one-to-many scan registration that minimizes distances between nearby oriented surface points and mitigates outliers with a robust loss function. Extending our previous approach CFEAR, we present an in-depth investigation on a wider range of data sets, quantifying the importance of filtering, resolution, registration cost and loss functions, keyframe history, and motion compensation. We present a new solving strategy and configuration that overcomes previous issues with sparsity and bias, and improves our state-of-the-art by 38%, thus, surprisingly, outperforming radar SLAM and approaching lidar SLAM. The most accurate configuration achieves 1.09% error at 5Hz on the Oxford benchmark, and the fastest achieves 1.79% error at 160Hz.

translated by 谷歌翻译

Analysis and prediction of heart stroke from ejection fraction and serum creatinine using LSTM deep learning approach

Md Ershadul Haque , Salah Uddin , Md Ariful Islam , Amira Khanom , Abdulla Suman , Manoranjan Paul

分类：计算机视觉 | 机器学习

2022-09-28

大数据和深度学习的结合是一项破坏世界的技术，如果正确使用，可以极大地影响任何目标。随着深度学习技术中大量医疗保健数据集和进步的可用性，系统现在可以很好地预测任何健康问题的未来趋势。从文献调查中，我们发现SVM用于预测心力衰竭的情况，而无需关联客观因素。利用电子健康记录（EHR）中重要历史信息的强度，我们利用长期记忆（LSTM）建立了一个智能和预测的模型，并根据该健康记录预测心力衰竭的未来趋势。因此，这项工作的基本承诺是使用基于患者的电子药用信息的LSTM来预测心脏的失败。我们已经分析了一个数据集，该数据集包含在Faisalabad心脏病学研究所和Faisalabad（巴基斯坦旁遮普邦）的盟军医院收集的299例心力衰竭患者的病历。这些患者由105名女性和194名男性组成，年龄在40岁和95岁之间。该数据集包含13个功能，这些功能报告了负责心力衰竭的临床，身体和生活方式信息。我们发现我们的分析趋势越来越多，这将有助于促进心中预测领域的知识。

translated by 谷歌翻译